Developer(s) | Intel |
---|---|
Stable release | 12.1 / September 8, 2011 |
Operating system | Linux, Microsoft Windows and Mac OS X |
Type | Compiler |
License | Proprietary |
Website | http://software.intel.com/en-us/intel-compilers/ |
Intel C++ Compiler (also known as icc or icl) is a group of C and C++ compilers from Intel Corporation available for GNU/Linux, Mac OS X, and Microsoft Windows.
Intel supports compilation for its IA-32 and Intel 64 processors and certain non-Intel but compatible processors, such as AMD processors. Developers should check system requirements. The Intel C++ Compiler for IA-32 and Intel 64 features an automatic vectorizer that can generate SSE, SSE2, SSE3, SSSE3, SSE4 and AVX SIMD instructions; the embedded variant targets Intel Wireless MMX and MMX 2.[1] Since its introduction, the Intel C++ Compiler for IA-32 has greatly increased adoption of SSE2 in Windows application development.
Intel C++ Compiler further supports both OpenMP 3.1 and automatic parallelization for symmetric multiprocessing. With the add-on capability Cluster OpenMP, the compiler can also automatically generate Message Passing Interface calls for distributed memory multiprocessing from OpenMP directives.
Intel C++ Compiler belongs to the family of compilers with the Edison Design Group frontend (like the SGI MIPSpro, Comeau C++, Portland Group, and others). The compiler is also notable for being widely used for SPEC CPU Benchmarks of IA-32, x86-64, and Itanium 2 architectures.
The Intel C++ Compiler is available in various packages from Intel including Intel Parallel Studio, Intel Parallel Studio XE, the Intel C++ Composer package, the Intel C++ Composer XE package, the Intel Composer XE package and the Intel Cluster Studio. The Intel Software Products site provides more information.
Intel tunes its compilers to optimize for its hardware platforms to minimize stalls and to produce code that executes in the fewest number of cycles. The Intel C++ Compiler supports three separate high-level techniques for optimizing the compiled program: interprocedural optimization (IPO), profile-guided optimization (PGO), and high-level optimizations (HLO). It also supports tools and techniques for adding and maintaining parallelism to applications.
Profile-guided optimization refers to a mode of optimization where the compiler is able to access data from a sample run of the program across a representative input set. The data would indicate which areas of the program are executed more frequently, and which areas are executed less frequently. All optimizations benefit from profile-guided feedback because they are less reliant on heuristics when making compilation decisions.
High-level optimizations are optimizations performed on a version of the program that more closely represents the source code. This includes loop interchange, loop fusion, loop unrolling, loop distribution, data prefetch, and more.[2] These optimizations are usually very aggressive and may take considerable compilation time.
Interprocedural optimization applies typical compiler optimizations (such as constant propagation) but using a broader scope that may include multiple procedures, multiple files, or the entire program.[3]
The compilers include a parallel debugger extension, Intel Threading Building Blocks, lambda function support, and a source checker tool for use with threaded code.
Intel's compiler has been criticized for applying, by default, floating-point optimizations that are not allowed by the C standard and that require special flags with other compilers such as gcc.[4]
Intel's suite of compilers has front ends for C, C++, and Fortran.
Early versions of ICC for Linux that predate GCC 3.x use the Dinkumware name mangling scheme in order to provide a more standard implementation of C++ than GCC 2.x. This made its ABI incompatible with both GCC versions. Intel removed the Dinkumware libraries in the 10.0 release (June 2007). Since then, the compiler has been and remains compatible with GCC 3.2 and later.
The following versions of Intel C++ Compiler have been released:
Compiler version | Release date | Major New Features |
---|---|---|
Intel C++ Composer XE 2011 Update 6 and above (compiler 12.1) | September 8, 2011 | Intel Cilk Plus language extensions updated to support specification version 1.1 and available on Mac OS X in addition to Windows and Linux, Intel Threading Building Blocks updated to support version 4.0, Apple blocks supported on Mac OS X, improved C++11 support including support for Variadic templates, OpenMP 3.1 support.[5] |
Intel C++ Composer XE 2011 up to Update 5 (compiler 12.0) | Nov 7, 2010 | Intel Cilk Plus language extensions, Guided Auto-Parallelism, Improved C++11 support.[6] |
Intel C++ Compiler 11.1 | June 23, 2009 | Support for latest Intel SSE SSE4.2, AVX and AES instructions. Parallel Debugger Extension. Improved integration into Microsoft Visual Studio, Eclipse CDT 5.0 and Mac Xcode IDE. |
Intel C++ Compiler 11.0 | November 2008 | Initial C++11 support [1]. VS2008 IDE integration on Windows. OpenMP 3.0. Source Checker for static memory/parallel diagnostics. |
Intel C++ Compiler 10.1 | November 7, 2007 | New OpenMP* compatibility runtime library: with the new OpenMP RTL, libraries and objects built by Visual C++ can be mixed and matched. To use the new libraries, the options are "/Qopenmp /Qopenmp-lib:compat" on Windows and "-openmp -openmp-lib:compat" on Linux. This version also supports more intrinsics from Visual Studio 2005. VS2008 support was command line only in this release; the IDE integration was not yet supported. |
Intel C++ Compiler 10.0 | June 5, 2007[7] | Improved parallelizer and vectorizer, Streaming SIMD Extensions 4 (SSE4), new and enhanced optimization reports for advanced loop transformations, new optimized exception handling implementation. |
Intel C++ Compiler 9.0 | June 14, 2005[8] | AMD64 architecture (for Windows), software-based speculative pre-computation (SSP) optimization, improved loop optimization reports.[9][10] |
Intel C++ Compiler 8.1 | September 2004 | AMD64 architecture (for Linux).[11][12] |
Intel C++ Compiler 8.0 | December 15, 2003[13] | Precompiled headers, code-coverage tools. [2] |
Intel C++ Compiler 7.1 | March 2003 | Partial support for the Intel Pentium 4 with Streaming SIMD Extensions 3 (SSE3). [3] |
Intel C++ Compiler 7.0 | November 25, 2002[14] | [4] |
Intel C++ Compiler 6.0 | April 24, 2002[15] | [5] |
In addition, the following "prototype" editions have been made available:
Compiler version | Release date | Major New Features |
---|---|---|
Intel STM Compiler Prototype Edition | September 17, 2007[16] | Prototype version of the Intel compiler that implements support for Software Transactional Memory (STM). The Intel STM Compiler supports Linux and Windows, producing 32 bit code for x86 (Intel and AMD) processors. Intel stated the belief that "The availability of such a prototype compiler allows unprecedented exploration by C / C++ software developers of a promising technique to make programming for multi-core easier." The STM compiler requires that you already have the Intel compiler installed. |
Intel Concurrent Collections for C/C++ 0.3 | September, 2008 | Intel Concurrent Collections for C/C++ provides a mechanism for constructing C++ programs that execute in parallel. It allows developers to ignore issues of parallelism such as low-level threading constructs or scheduling/distribution of computations. The model allows developers to specify high-level computational steps including inputs and outputs without imposing unnecessary ordering on their execution. Code within the computational steps is written using standard serial constructs of the C++ language. Data is either local to a computational step or it is explicitly produced and consumed by them. It supports multiple styles of parallelism (e.g., data, task, pipeline parallel). |
Documentation can be found at the Intel Software Technical Documentation site.
Windows | Linux & Mac OS X | Comment |
---|---|---|
/Od | -O0 | No optimization |
/O1 | -O1 | Optimize for size |
/O2 | -O2 | Optimize for speed; enables most optimizations |
/O3 | -O3 | Enables all /O2 optimizations plus more intensive loop optimizations |
/QxO | -xO | Enables optimizations for the SSE, SSE2 and SSE3 instruction sets; the generated code also runs on compatible non-Intel CPUs [17] |
/fast | -fast | Shorthand. On Windows this equates to "/O3 /Qipo /QxHost /no-prec-div"; on Linux to "-O3 -ipo -static -xHOST -no-prec-div". The processor-specific flag (-xHOST) optimizes for the processor the code is compiled on; it is the only part of -fast that may be overridden. |
/Qprof-gen | -prof_gen | Compile the program and instrument it for a profile generating run. |
/Qprof-use | -prof_use | May only be used after running a program that was previously compiled using prof_gen. Uses profile information during each step of the compilation process. |
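A typical two-step profile-guided build using the flags above might look like this (a sketch: the source file, binary names, and input file are illustrative, and the -prof-dir option for choosing where profile files land is an assumption about the compiler's flag set):

```shell
# Step 1: build an instrumented binary (Linux flag spelling from the table)
icc -prof_gen -o app_instrumented app.c

# Step 2: run it on representative input; this writes profile (.dyn) files
./app_instrumented typical_input.dat

# Step 3: rebuild, letting the compiler consume the collected profiles
icc -prof_use -o app app.c
```

The quality of the final binary depends on how representative the training run in step 2 is of real workloads.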
The Intel compiler produces debugging information in the formats standard for common debuggers: DWARF 2 on Linux (usable by gdb) and COFF on Windows. The flags to compile with debugging information are /Zi on Windows and -g on Linux.
Intel also provides its own debugger called idb, which can be run in both dbx and gdb compatible command mode.
While the Intel compiler can generate a gprof compatible profiling output, Intel also provides a kernel level, system-wide statistical profiler as a separate product called VTune. VTune features an easy-to-use GUI (integrated into Visual Studio for Windows, Eclipse for Linux) as well as a command line interface.
The 11.x releases of the compiler introduced the Parallel Debugger Extension, which provides techniques for debugging threaded applications. It can be used with other, compatible compilers, such as Microsoft Visual C++ on Windows as available in Visual Studio 2005 and 2008 and gcc on Linux. Visual Studio 2010 support was added by the 12.x releases.
Intel and third parties have published benchmark results to substantiate performance leadership claims over other commercial, open source and AMD compilers and libraries on Intel and non-Intel processors. Intel and AMD have documented flags to use on the Intel compilers to get optimal performance on Intel and AMD processors.[18][19] Nevertheless, the Intel compilers have been accused of producing sub-optimal code with mercenary intent. For example, Steve Westfield wrote in a 2005 article at the AMD website:[20]
“ | Intel 8.1 C/C++ compiler uses the flag -xN (for Linux) or -QxN (for Windows) to take advantage of the SSE2 extensions. For SSE3, the compiler switch is -xP (for Linux) and -QxP (for Windows). [...] With the -xN/-QxN and -xP/-QxP flags set, it checks the processor vendor string—and if it's not "GenuineIntel," it stops execution without even checking the feature flags. Ouch! | ” |
The Danish developer and scholar Agner Fog wrote in 2009:[21]
“ | The Intel compiler and several different Intel function libraries have suboptimal performance on AMD and VIA processors. The reason is that the compiler or library can make multiple versions of a piece of code, each optimized for a certain processor and instruction set, for example SSE2, SSE3, etc. The system includes a function that detects which type of CPU it is running on and chooses the optimal code path for that CPU. This is called a CPU dispatcher. However, the Intel CPU dispatcher does not only check which instruction set is supported by the CPU, it also checks the vendor ID string. If the vendor string is "GenuineIntel" then it uses the optimal code path. If the CPU is not from Intel then, in most cases, it will run the slowest possible version of the code, even if the CPU is fully compatible with a better version. | ” |
This vendor-specific CPU dispatching can decrease the performance on non-Intel processors of software built with an Intel compiler or an Intel function library, possibly without the programmer's knowledge. This has allegedly led to misleading benchmarks.[21] A legal battle between AMD and Intel over this and other issues was settled in November 2009.[22] In late 2010, Intel settled a US Federal Trade Commission antitrust investigation.[23]
The FTC settlement included a disclosure provision under which Intel must:[24]
“ | ...publish clearly that its compiler discriminates against non-Intel processors (such as AMD's designs), not fully utilizing their features and producing inferior code. | ” |
In compliance with this rule, Intel added an "optimization notice" to its compiler descriptions stating that they "do not optimize equally for non-Intel microprocessors" and that "certain compiler options for Intel compilers, including some that are not specific to Intel micro-architecture, are reserved for Intel microprocessors". It says that:[25]
“ | Intel® compilers, associated libraries and associated development tools may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel® Streaming SIMD Extensions 2 (Intel® SSE2), Intel® Streaming SIMD Extensions 3 (Intel® SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel® SSSE3) instruction sets and other optimizations. | ” |